Abstract



Motivated by an investigation of ground state properties of randomly charged
polymers, we discuss the size distribution of the largest $Q$-segments (segments
with total charge $Q$) in such $N$-mers. Upon mapping the charge sequence to
one-dimensional random walks (RWs), this corresponds to finding the probability
for the largest segment with total displacement $Q$ in an $N$-step RW
to have length $L$. Using analytical, exact enumeration, and Monte Carlo
methods, we reveal the complex structure of the probability distribution in
the large $N$ limit. In particular, the size of the longest neutral segment has a
distribution with a square-root singularity at $\ell = L/N = 1$, an essential
singularity at $\ell = 0$, and a discontinuous derivative at $\ell = 1/2$. The behavior near
$\ell = 1$ is related to another interesting RW problem which we call the "staircase
problem". We also discuss the generalized problem for $d$-dimensional RWs.
I. INTRODUCTION



The importance of understanding proteins [1] has attracted much attention to the
statistical mechanics of heterogeneous polymers. Heteropolymers built from a
random mixture of positively and negatively charged groups along their backbone are called
polyampholytes (PAs). The presence of long-range electrostatic interactions causes rather
unique behavior in such polymers: the behavior of a single PA with unscreened electrostatic
interactions at a low temperature $T$ is extremely sensitive to its total (excess) charge $Q_0$.
Geometrical properties of polymers can be conveniently described by their radius of gyration
(root-mean-squared size) $R_g$ [?]. At high $T$, the effect of electrostatic interactions is small
and $R_g$ is approximately equal to that of an uncharged polymer. However, upon lowering
$T$ the PA attempts to take advantage of the presence of two types of charges along its
backbone by assuming spatial conformations in which every charge is predominantly
surrounded by charges of the opposite sign. This behavior can be approximately described using
a Debye-Hückel-type theory [3], which leads to the conclusion that at low $T$ the polymer
should collapse into a dense state with condensation energy $E_{\rm cond} \sim -N q_0^2/a$, where $N$
is the number of monomers, $q_0$ is the typical charge of a monomer, and $a$ is a microscopic
distance such as the diameter of a monomer. In such a collapsed state, $R_g \sim N^{1/3}$. On the
other hand, renormalization-group-inspired scaling arguments showed [?] that at low $T$ one
should expect a strongly stretched state with $R_g \sim N$. This apparent contradiction was
resolved by noting [?] that the low-$T$ behavior is extremely sensitive to the overall charge
$Q_0$. It has been observed that randomly charged PAs with vanishing $Q_0$ indeed collapse
at low $T$, while $R_g$ averaged over unrestricted quenches grows with decreasing $T$.
Such sensitivity is consistent with experimental observations of PAs [?].

From a detailed study of the $Q_0$-dependence of $R_g$, the following picture began emerging
[?]: Consider a dense (globular, approximately spherical) low-$T$ state of the PA. Its energy
can be roughly separated into three terms, as

$$ E = -N\,\frac{q_0^2}{a} + \gamma S + \frac{Q_0^2}{R_g} . \tag{1} $$

(In this description we omit the dimensionless prefactors of order unity.) The first term
in this equation represents the Debye-Hückel-type condensation energy, the second term is
the surface energy (where the surface tension $\gamma \approx q_0^2/a^3$ and the surface area $S \sim a^2 N^{2/3}$),
while the last term is the electrostatic energy of a globule of radius $R_g \sim a N^{1/3}$. For
vanishing $Q_0$, the globule remains approximately spherical. However, when $Q_0 > Q_R \approx
q_0 N^{1/2}$, the electrostatic term exceeds the surface tension term, the spherical shape becomes
unstable and the polymer starts to stretch in order to minimize the electrostatic energy.
Since the threshold charge $Q_R$ increases with $N$ exactly as the standard deviation of the
total charge $Q_0$ in a random sequence of charges, for any $N$ there will be a finite fraction
of chains with $Q_0$ exceeding $Q_R$. (Note that this property is specific to three-dimensional
electrostatic interactions. For the $N$-dependence of $Q_R$ in general space dimensions, see
Ref. [?].) While the above arguments suggest that a typical PA should stretch out at low
$T$, such stretching may lead to a loss of the condensation energy. A reasonable compromise
between stretching (which minimizes the electrostatic energy) and remaining compact (which
gains in condensation energy) is for the PA to form a necklace of weakly charged blobs
connected by highly charged "necks", taking advantage of the charge fluctuations along
the chain. The results of Monte Carlo [?] and exact enumeration [?] studies qualitatively
support such a picture. An example of such a low energy configuration is shown in Fig. 1.

While the exact treatment of electrostatic interactions is not possible, we can pose a
simplified problem which, we hope, captures some essential features of this necklace model.
For example, we may ask what the typical size of the largest neutral (or weakly charged)
segment in a random sequence of $N$ charges will be. In order to answer this question, we
investigated the size distribution of the largest $Q$-segments (segments with a total charge
$Q$) in such $N$-mers. This problem can be mapped to a one-dimensional random walk (RW):
the sequence of charges $\{q_i\}$ ($i = 1, \ldots, N$; $q_i = \pm 1$) is mapped into a sequence of unit
steps in the positive or negative directions along an axis. A sequence of charges with
vanishing total charge $Q_0$ now corresponds to a RW which returns to the origin after $N$
steps, while a neutral segment inside the sequence of charges corresponds to a loop inside
the RW. Similarly, a segment with charge $Q$ corresponds to a segment (in the corresponding
RW) whose end is displaced by $Q$ units from its beginning. The primary objective of this
work is to investigate the probability $P_N(L,Q)$ that the largest $Q$-segment in an $N$-step
RW has length $L$.

The formulation of the problem is deceptively simple: it is similar (and
related) to classical RW problems [?], such as the problem of first passage times or
the problem of last return to the starting point, for which probability distributions can be
computed exactly by using the method of reflections [?]. However, the search for the longest
segment of the RW, among all possible starting points, creates a more complicated problem.

Some of the results presented in this paper have been briefly reported before [?]. In this
work we present a complete exposition of those results, as well as many new results related
to this problem and its generalized version. In Section II we define the problem accurately
and argue that in the large $N$ limit it can be described in terms of a probability density
$p(\ell,q)$, where $\ell = L/N$ and $q = Q/\sqrt{N}$ are the reduced length and charge, respectively. This
probability density is investigated using Monte Carlo (MC) and exact enumeration methods,
as well as by analytical arguments. In particular, we show that the function $p(\ell,0)$ has an
essential singularity in the $\ell \to 0$ limit, and diverges as $1/\sqrt{1-\ell}$ in the limit $\ell \to 1$. These
properties can be easily understood from qualitative arguments presented in Section III. In
Section IV we construct an exact integral expression which enables an analytic investigation
of certain properties of $p(\ell,q)$. In Section V we show that our problem is related to a
different problem of two random walkers (which we call the "staircase problem"). This
relation enables us to use the latter problem to investigate the behavior of $p(\ell,0)$ in the
limit $\ell \to 1$. While some of the properties of $p(\ell,q)$ can be deduced analytically, we had
to complement our results by MC and exact enumeration studies, which appear in almost
every section of the paper along with analytical arguments on the subject.

Additional insight into the problem can be gained by considering its generalization
to $d$-dimensional RWs. (This generalization is not related to the original problem of PAs
or to their embedding dimension.) In this generalization, which is described in Sec. VI,
$Q$ is treated as a $d$-dimensional vector rather than a scalar. As in the one-dimensional
case, $Q = 0$ corresponds to a loop in the RW. Since the generalized problem investigates
the presence of large loops, it is somewhat related to the problem of self-avoiding walks
[?], whose behavior is also controlled by self-intersections (i.e. loops). In particular, the
probability distribution of the $Q$-segments becomes trivial for $d > 4$, when large loops are
virtually absent.



II. EXTREMAL SEGMENTS: DEFINITIONS AND MAIN PROPERTIES 

In this Section, we present an exact definition of the problem of extremal segments of 
a one dimensional sequence and review the qualitative features of the resulting probability 
distributions. 

Consider the set $\Omega_N$ which contains all $N$-element sequences $\{q_i\}$ ($i = 1, \ldots, N$; $q_i = \pm 1$).
Here, $q_i$ physically corresponds to the charge (positive or negative) on the $i$th monomer of
the $N$-mer. Alternatively, it can be thought of as the direction of the $i$th step of an
$N$-step one-dimensional RW. A randomly charged polymer (or, alternatively, RW) can then be
represented as a random sequence (RS) $\omega \in \Omega_N$ picked with equal probability $2^{-N}$. Fig. 2
depicts an example of such a sequence and the corresponding path, where the position
$S_i(\omega) = \sum_{j=1}^{i} q_j$ of the path at index $i$ gives the accumulated charge from the beginning of
the polymer up to the $i$th monomer, with $S_0(\omega) = 0$. In the language of RWs, $S_i$ is simply
the displacement of the walk from the origin after $i$ steps. Every segment of the sequence
between, say, steps $i$ and $j$, has a certain charge $Q_{ij}(\omega) = S_j(\omega) - S_i(\omega)$. A segment for
which $Q_{ij}(\omega) = Q$ will be called a $Q$-segment. Given a randomly chosen sequence $\omega \in \Omega_N$
and a charge $Q$, let $P_N(L,Q)$ denote the probability that the largest $Q$-segment in $\omega$ has
length $L$. It should be stressed that the definition refers to the largest $Q$-segment among
many possible $Q$-segments with different starting points which may exist in $\omega$. For example,
the dotted lines in Fig. 2 indicate the longest 0-segments ($L = 18$) and the dot-dashed lines
show the longest 4-segments ($L = 22$) in a sequence with $N = 24$ [?]. Clearly, the longest
$Q$-segment does not have to be unique. If there is at least one $Q$-segment in the sequence
then its length $L$ satisfies $0 \le L \le N$. From the definitions it is clear that a 0-segment
is always present and therefore $\sum_{L=0}^{N} P_N(L,0) = 1$. However, the set of $Q$-segments in a
given sequence may be empty for $|Q| > 0$: for example, the sequence shown in Fig. 2 has
no 8-segments. Thus, $\sum_{L=0}^{N} P_N(L,Q) < 1$ for $|Q| > 0$.
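
The definitions above can be illustrated with a minimal Python sketch. It is not the enumeration code described in the Appendix; the function names `longest_q_segment` and `exact_distribution` are hypothetical, and a brute-force search over all start points is chosen for clarity rather than speed.

```python
from fractions import Fraction
from itertools import product


def longest_q_segment(seq, Q):
    """Length of the longest segment of seq (entries +1/-1) with total charge Q.

    Returns None when the sequence contains no Q-segment at all.
    """
    S = [0]                      # prefix sums S_0, S_1, ..., S_N
    for q in seq:
        S.append(S[-1] + q)
    n = len(seq)
    for L in range(n, -1, -1):   # try the longest lengths first
        for i in range(n - L + 1):
            if S[i + L] - S[i] == Q:
                return L
    return None


def exact_distribution(N, Q):
    """Exact P_N(L, Q) as {L: probability}, enumerating all 2**N sequences."""
    counts = {}
    for seq in product((1, -1), repeat=N):
        L = longest_q_segment(seq, Q)
        if L is not None:
            counts[L] = counts.get(L, 0) + 1
    return {L: Fraction(c, 2**N) for L, c in sorted(counts.items())}
```

For $N = 4$ and $Q = 0$ this gives $P_4(0,0) = 1/8$, $P_4(2,0) = 1/2$ and $P_4(4,0) = 3/8$, which sum to one, whereas for $|Q| > 0$ the probabilities sum to less than one because some sequences contain no $Q$-segment.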

Most properties of RSs have simple continuum limits. We demonstrate this in Sections
III and [?] by discussing RW problems that are exactly solvable, and relating them to the
behavior of $P_N(L,Q)$ in certain limits. Thus, we also expect $P_N(L,Q)$ to approach a similar
scaling form when $N, L, Q \to \infty$, while the reduced length $\ell = L/N$ and the reduced charge
$q = Q/\sqrt{N}$ are kept constant. In this continuum limit, it is more convenient to work with
the probability density

$$ p(\ell,q) = \frac{N}{2}\left[P_N(L,Q) + P_N(L+1,Q)\right] . \tag{2} $$

Of course, for small $N$, this definition of $p(\ell,q)$ will still depend on $N$. We expect it to
become a function only of the reduced variables in the $N, L, Q \to \infty$ limit. Note that at
least one term in the square brackets of Eq. (2) vanishes since $P_N(L,Q) = 0$ for odd $L + Q$.
To prevent even-odd oscillations, we included two terms in the definition of $p$, as in the
definitions which are used in continuum limits of discrete RWs.

We have initially examined the behavior of $P_N(L,Q)$ using numerical (exact enumeration
and Monte Carlo) methods, details of which are given in the Appendix. Monte Carlo results
obtained for a variety of large $N$s up to $N = 10^4$ were virtually indistinguishable from each
other when plotted in the properly scaled variables. The results for $N = 1000$ are depicted
as a solid curve in each one of the graphs in Fig. 3. For that particular value of $N$ we
evaluated the probability density from $10^8$ randomly selected sequences. For short chains
(up to $N = 36$) it was possible to perform a complete enumeration and get the exact results
for $P_N(L,Q)$. When these exact results are plotted in the scaled form, as presented in Fig. 3,
we can see that even for such modest values of $N$, there is an extremely fast convergence to
the continuum distribution $p(\ell,0)$, depicted by the solid curve (especially for $\ell > 0.5$).
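
A Monte Carlo estimate of $p(\ell,0)$ can be sketched as follows. This is only an illustration under our own assumptions: the Appendix is not reproduced here, so the linear-time search (which records the first index at which each value of $S_i$ occurs), the binning, and the names `longest_neutral_segment` and `mc_density` are hypothetical choices, not the procedure actually used for the figures.

```python
import random


def longest_neutral_segment(seq):
    """Length of the longest 0-segment of seq (entries +1/-1) in O(N) time."""
    first_seen = {0: 0}          # first index i at which each value S_i occurs
    S = best = 0
    for i, q in enumerate(seq, start=1):
        S += q
        if S in first_seen:
            best = max(best, i - first_seen[S])
        else:
            first_seen[S] = i
    return best


def mc_density(N, samples, bins=20, seed=0):
    """Histogram estimate of p(l, 0) on `bins` equal intervals of l = L/N."""
    rng = random.Random(seed)
    hist = [0] * bins
    for _ in range(samples):
        seq = [rng.choice((1, -1)) for _ in range(N)]
        ell = longest_neutral_segment(seq) / N
        hist[min(int(ell * bins), bins - 1)] += 1   # l = 1 goes in the last bin
    width = 1.0 / bins
    return [h / (samples * width) for h in hist]
```

With far fewer samples than the $10^8$ quoted above, such a histogram is only a rough picture of the curve; the binning also smooths the square-root divergence in the last bin.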
The probability density $p(\ell,0)$ shown in Fig. 3 has several remarkable properties:

(a) MC results show that $p$ at $\ell = 1/2$ is very close to unity ($1.004 \pm 0.006$). At that point
the slope of the curve changes by an order of magnitude. While it is impossible to ascertain
from the numerical results that there is actually a discontinuity in the first derivative of
$p(\ell,0)$ with respect to $\ell$, both the MC results and analytical arguments indicate that $\ell = 1/2$
is a very special point of the curve.

(b) For $\ell \to 0$, the function exhibits an essential singularity of the form $\sim \ell^{-x}\exp(-B/\ell)$,
where $B \approx 1.7$ and $x \approx 1.5$ to $2$. The estimates of the coefficient $B$ and of the exponent $x$
have been obtained from the MC data. However, in the $\ell \to 0$ limit we are dealing with
almost vanishing probabilities, and therefore the statistical accuracy is low. Thus the
estimates depend on the precise range of $\ell$s for which the fit is performed. Nevertheless,
the existence of the singularity can be easily understood from the fact that for small $\ell$ the
absence of large loops in the entire chain can be thought of as a requirement that such loops
are absent in many separate and independent segments of the sequence. In Section III
this argument will be discussed in detail.



(c) For $\ell \to 1$, $p(\ell,0)$ diverges as $A/\sqrt{\pi(1-\ell)}$, with $A = 1.008 \pm 0.005$. This estimate of the
constant $A$ has been obtained from MC results for $N = 1000$ sequences. In Section IV we
prove the existence of the square-root singularity from an integral relation which is derived
for $p(\ell,q)$. The proof, however, does not provide a value for the prefactor $A$, and we are
limited to MC estimates, as well as results extracted from exact enumeration studies which
will be presented in Sec. IV. (The accumulated evidence of MC and exact enumeration
shows that $A$ is definitely larger than 1.) Some more intuitive, although less rigorous,
results regarding the $\ell \to 0$ and $\ell \to 1$ limits are presented in Section III. The exact
enumeration results depicted in Fig. 3 are not suitable for extraction of asymptotic behavior,
since the $N$s are too small. In Section IV we show that it is possible to exactly calculate
$P_N(L = N - M, Q)$ for small $M$ (i.e. $M = 0, 2, 4$) and arbitrary sequence length $N$. In
principle, the correct behavior of $p(\ell,0)$ in the $\ell \to 1$ limit can be deduced from the exact
values of $P_N(L = N - M, 0)$ only if the limit $N, M \to \infty$ (while keeping $M/N = 1 - \ell$
constant) is taken before the $\ell \to 1$ limit. Somewhat surprisingly, if we attempt to match
the asymptotic form of $p(\ell,0)$ near $\ell = 1$ with $P_N(L,0)$ for $L = N - 2$, we find $A = 1$,
i.e. we reproduce almost the exact value of the prefactor. Thus, the discrete distribution
approaches its asymptotic (continuum) form within a few steps of the extreme $L = N$.

Consider next the full probability density $p(\ell,q)$, which is depicted in Fig. 4. Introduction
of an additional variable $q$ significantly increased the CPU time needed to analyze a single
RS. The MC data in this figure represent only $10^7$ sequences of length $N = 1024$, i.e.
their accuracy is lower than that of the MC results depicted by the solid line in Fig. 3. Fig. 4
demonstrates further peculiarities of $p(\ell,q)$: For fixed $\ell$, the $q$-dependence of $p$ is qualitatively
different for $\ell > 1/2$ and $\ell < 1/2$: When $\ell > 1/2$, the distribution has a single peak at $q = 0$,
which approaches a Gaussian shape as $\ell$ increases, while for $\ell < 1/2$ we see a minimum at
$q = 0$ and two peaks symmetrically located around the minimum. While qualitatively such
behavior can be easily understood (e.g. for small $\ell$ the 0-segments are very improbable,
since they are typically large, and consequently the maximum must be reached for a non-zero
value of $q$), the transition between the $\ell < 1/2$ and $\ell > 1/2$ regions is rather sharp: we analyzed
the $q$-dependence of the graphs representing the fixed-$\ell$ sections of Fig. 4 and concluded
that the transition from a single maximum to a minimum surrounded by two maxima cannot
be obtained by a variation of parameters in a simple function (the way it is done in the
mean-field description of a phase transition near the critical temperature). The numerical
data create an impression of two different functions glued along $\ell = 1/2$.

The areas $A_\ell = \int dq\, p(\ell,q)$ under fixed-$\ell$ sections are shown in Fig. 5. For $\ell > 1/2$ it will
be proven in Sec. IV that $A_\ell \sim \mathrm{const}/\sqrt{1-\ell}$; Fig. 5b demonstrates the numerical validity
of this relation: $A_\ell\sqrt{1-\ell}$ remains approximately constant in the range of validity. The
accuracy in the small-$\ell$ regime is rather low; we only note that $A_\ell$ is approximately linear in $\ell$
for $0.15 < \ell < 0.5$, as can be seen from Fig. 5a.



III. QUALITATIVE ARGUMENTS 



In this section we present approximate derivations of several features of $p(\ell,q)$. Despite
the approximate nature of the arguments, they are rather intuitive, and will be useful when
we generalize the problem to $d$-dimensional RWs.

Most properties of RWs have simple continuum limits. As an example, let us consider
the special case $L = N$ of our probability distribution: The probability $P_N(L = N, Q)$ that
the largest $Q$-segment has length $N$ is simply equal to the probability that the overall charge
$Q_0$ of the RS is equal to $Q$. This probability (for even $N + Q_0$) is given by

$$ W_N(Q_0) \equiv \mathrm{Prob}\{S_N(\omega) = Q_0\} = 2^{-N}\,\frac{N!}{[(N-Q_0)/2]!\,[(N+Q_0)/2]!} \;\xrightarrow{N\to\infty}\; \sqrt{\frac{2}{\pi N}}\,\exp(-Q_0^2/2N) . \tag{3} $$
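
How quickly the binomial form of Eq. (3) approaches its Gaussian limit can be checked numerically. The following is a minimal sketch; the names `W_exact` and `W_gauss` are hypothetical.

```python
from math import comb, exp, pi, sqrt


def W_exact(N, Q0):
    """Exact probability that an N-step +/-1 walk ends at displacement Q0."""
    if abs(Q0) > N or (N + Q0) % 2:
        return 0.0               # unreachable, or wrong parity
    return comb(N, (N + Q0) // 2) / 2**N


def W_gauss(N, Q0):
    """Large-N form of W_N(Q0), valid for even N + Q0."""
    return sqrt(2.0 / (pi * N)) * exp(-Q0**2 / (2.0 * N))


if __name__ == "__main__":
    for N, Q0 in [(10, 0), (100, 10), (1000, 20)]:
        print(N, Q0, W_exact(N, Q0), W_gauss(N, Q0))
```

The Gaussian carries the factor 2 relative to a normalized density in $Q_0$ because only every second value of $Q_0$ is allowed for a given $N$.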

Consider a restricted subset of all RSs in $\Omega_N$ which consists only of sequences with total
charge $Q_0$. The conditional probability for the largest $Q$-segment in a sequence selected from
this subset to have length $L$ will be denoted as $P_N(L,Q|Q_0)$. This probability is related to
$P_N(L,Q)$ by the relation

$$ P_N(L,Q) = \sum_{Q_0} P_N(L,Q|Q_0)\,W_N(Q_0) . \tag{4} $$

In the case of $Q = 0$, i.e. for 0-segments, we note that from the definition it follows that the
conditional probability is normalized, i.e. $\sum_L P_N(L,0|Q_0) = 1$. We further note that as a
function of $L$, the conditional probability is expected to be peaked at a value which depends
on $Q_0$. Let us assume for simplicity that the peak is very narrow, i.e. the length of the
largest 0-segment is uniquely determined by $Q_0$ and can be described by a function $Q_0(L)$.
Indeed, when $Q_0 \approx 0$, the longest 0-segment typically has $L \approx N$, while for very large $Q_0$,
the longest 0-segment must be short. Thus $Q_0(L)$ is a monotonically decreasing function.
This approximation is especially reasonable for the extremes $\ell \to 0$ or $1$. In that case,
$P_N(L,0) \approx W_N(Q_0(L))$, and thus

$$ p(\ell,0) \approx N\,W_N(Q_0(L))\left|\frac{dQ_0}{dL}\right| . \tag{5} $$

Standard scaling arguments suggest that for $Q_0 \ll \sqrt{N}$ we can relate $L \approx N - aQ_0^2$, where
$a$ is of order unity. This gives $Q_0(L) \approx \sqrt{(N-L)/a}$, and finally leads to

$$ p(\ell,0) \propto \frac{1}{\sqrt{a(1-\ell)}} . \tag{6} $$

On the other hand, for $Q_0 \gg \sqrt{N}$, the length of the longest 0-segment will be of the order of the
scale at which the random excursion of the RW becomes comparable to the drift produced
by $Q_0$, i.e., when $L^{1/2} \approx (2B)^{-1/2} L Q_0/N$, where $B$ is a constant of order unity. Thus,
$Q_0(L) \approx N\sqrt{2B/L}$ and

$$ p(\ell,0) \propto \ell^{-3/2}\exp(-B/\ell) . \tag{7} $$

Thus, this simple scaling argument correctly reproduces the square-root divergence for $\ell \to 1$,
and the $\exp(-\mathrm{const}/\ell)$ singularity for $\ell \to 0$.

It is useful to consider an alternative derivation of the behavior in the $\ell \to 0$ limit, since
such a derivation involves a somewhat different view of the same properties. A RS with an
extremely short 0-segment must have a strong imbalance between the charges (large $Q_0$), i.e.
resemble a biased random walk. Consider the probability $Z_N(L) = \sum_{L'=0}^{L} P_N(L',0)$ that the
largest 0-segment in an $N$-step sequence does not exceed length $L$. If $L \ll N$, this quantity
can be used to estimate $Z_{2N}(L)$ for a sequence twice as long: The two halves of the sequence
of length $2N$ must be biased walks with the same direction of bias to prevent creation of
long loops, which start in one half of the sequence and end in the other half. In addition, loops
longer than $L$ must be absent from each half of the sequence. Thus, $Z_{2N}(L) \approx \frac{1}{2}Z_N^2(L)$. This
relation is only approximate since it disregards the correlation between the two halves of the
sequence close to its middle. (Loops longer than $L$ can begin in one half of the sequence and
end in the other half; correction for this effect may introduce an $L$-dependent prefactor.) If
the continuum limit is well defined, we can express this relation in the form

$$ \int_0^{\ell/2} p(\ell',0)\,d\ell' \approx \frac{1}{2}\left(\int_0^{\ell} p(\ell',0)\,d\ell'\right)^2 . \tag{8} $$

This relation is satisfied by $p(\ell,0) = (2B/\ell^2)\,e^{-B/\ell}$. The approximate nature of the argument
casts serious doubts on the exact value of the power of $\ell$ in the prefactor to the exponential.
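
The claim that this form satisfies Eq. (8) can be verified in a few lines. The check below is a sketch that introduces the cumulative function $Z(\ell)$ as shorthand (our notation for the continuum version of $Z_N(L)$); it applies only at small $\ell$, since $Z(\ell)$ as written is not normalized over the whole interval.

```latex
\begin{align*}
Z(\ell) &\equiv \int_0^{\ell} p(\ell',0)\,d\ell'
         = \int_0^{\ell} \frac{2B}{\ell'^2}\,e^{-B/\ell'}\,d\ell'
         = 2\,e^{-B/\ell}, \\
Z(\ell/2) &= 2\,e^{-2B/\ell}
           = \frac{1}{2}\left[\,2\,e^{-B/\ell}\right]^2
           = \frac{1}{2}\,Z(\ell)^2 .
\end{align*}
```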

Note that the two different derivations of the behavior of $p(\ell,0)$ in the $\ell \to 0$ limit produced
different preexponential powers. As was mentioned in the previous Section, our MC results
are not accurate enough to distinguish between these predictions.

The method of reflections is a standard tool in calculating the behavior of random walkers
near reflecting or absorbing walls (see Ref. [?]). It can be used to calculate various seemingly
nontrivial probabilities in terms of probabilities that are easily evaluated. One such result,
which is important for the following discussion, is that the probability for an $N$-step RW
(with even $N$) to never return to its starting point is equal to the probability that it reaches
its starting point exactly at the $N$th step [13], i.e.

$$ \mathrm{Prob}\{S_i(\omega) \neq 0,\ 1 \le i \le N\} = \mathrm{Prob}\{S_N(\omega) = 0\} = W_N(0) , \tag{9} $$

where $W_N$ was defined in Eq. (3). This relation permits, for instance, an exact solution
to a simplified version of our problem. In the modified problem, the largest $Q$-segments
are selected among those that start from the beginning of the RS, rather than from all possible
starting positions. This modified probability $P'_N(L,Q)$ is given by the probability that the
path $\omega$ reaches position $Q$ at the $L$th step, and that it never again passes through position
$Q$ until the $N$th step. Using Eqs. (3) and (9), we obtain the result (for $N$, $L$ and $Q$ all even
or all odd)



$$ P'_N(L,Q) = W_L(Q)\,W_{N-L}(0) = 2^{-N}\,\frac{L!}{[(L-Q)/2]!\,[(L+Q)/2]!}\;\frac{(N-L)!}{[(N-L)/2]!\,[(N-L)/2]!} \;\xrightarrow{N\to\infty}\; \frac{2}{\pi\sqrt{L(N-L)}}\,\exp(-Q^2/2L) . \tag{10} $$

Unfortunately, the search for the longest $Q$-segment in the RS among all possible starting
points creates a more complicated problem. However, we similarly expect $P'_N(L,Q)$ to
approach a scaling form when $N, L, Q \to \infty$, while the reduced length $\ell = L/N$ and the
reduced charge $q = Q/\sqrt{N}$ are kept constant. In this continuum limit, the probability
density is defined analogously to $p$: $p'(\ell,q) = \frac{N}{2}\left[P'_N(L,Q) + P'_N(L+1,Q)\right]$. In this limit
Eq. (10) reduces to

$$ p'(\ell,q) = \frac{1}{\pi\sqrt{\ell(1-\ell)}}\,\exp(-q^2/2\ell) . \tag{11} $$

We intuitively expect $p$ and $p'$ to behave similarly, at least in the $\ell \to 1$ limit, and indeed in
that limit $p'$ resembles $p$ (see Eq. (16)).
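
The product form $P'_N(L,Q) = W_L(Q)\,W_{N-L}(0)$, and with it Eq. (9), can be checked for small $N$ by direct enumeration. The sketch below is only an illustration; `W` and `last_visit_distribution` are hypothetical names, and exact rational arithmetic is used so that the comparison is an equality rather than a numerical tolerance.

```python
from fractions import Fraction
from itertools import product
from math import comb


def W(N, Q):
    """Exact probability that an N-step +/-1 walk ends at displacement Q."""
    if abs(Q) > N or (N + Q) % 2:
        return Fraction(0)
    return Fraction(comb(N, (N + Q) // 2), 2**N)


def last_visit_distribution(N, Q):
    """P'_N(L, Q): distribution of the last index L at which S_L = Q.

    This is the longest Q-segment that starts at the beginning of the walk.
    """
    counts = {}
    for seq in product((1, -1), repeat=N):
        S = 0
        last = 0 if Q == 0 else None     # S_0 = 0 counts as a visit for Q = 0
        for i, q in enumerate(seq, start=1):
            S += q
            if S == Q:
                last = i
        if last is not None:
            counts[last] = counts.get(last, 0) + 1
    return {L: Fraction(c, 2**N) for L, c in counts.items()}
```

The $Q = 0$, $L = 0$ entry is the probability of never returning to the origin, so comparing it with `W(N, 0)` is a direct test of Eq. (9) for even $N$.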



IV. EXACT RELATIONS 

The probabilities $P_N(L,Q)$ for different values of $N$, $L$ and $Q$ satisfy an interesting
relation, which in the continuum limit becomes an integral expression that relates $p(\ell,q)$
at arbitrary values of $\ell > 1/2$ and $q$ to the values of $p(\ell = 1/2, q)$. While such a relation is
insufficient to completely determine the function $p(\ell,q)$, it suffices to determine some of its
important features. In this Section, we derive this relation and explore its consequences.

We first consider the following sets of random sequences, for $N/2 < L < N$ and arbitrary
$Q$:

$A_Q = \{\omega \in \Omega_{2L-N} : S_{2L-N}(\omega) = Q\}$,
$B_Q = \{\omega \in \Omega_{2(N-L)} : \text{largest } Q\text{-segment in } \omega \text{ has size } N-L\}$,
$C_Q = \{\omega \in \Omega_N : \text{largest } Q\text{-segment in } \omega \text{ has size } L\}$.

$A_Q$ is the set of all $(2L-N)$-step sequences with total displacement (charge) $Q$. This set
has $2^{2L-N}\,W_{2L-N}(Q)$ elements, where the function $W$ has been defined in Eq. (3). The set
$B_Q$ contains all $(2N-2L)$-step sequences whose largest $Q$-segments are exactly half as long
as the whole sequence. By definition, there are $2^{2(N-L)}\,P_{2(N-L)}(N-L,Q)$ such sequences.
Finally, $C_Q$ is our "target set" which consists of all $N$-step sequences whose largest
$Q$-segment has length $L$. This set contains $2^N P_N(L,Q)$ sequences. We shall use the sequences
from the $A$- and $B$-type sets to construct the sequences of the "target set": It is possible
to construct a one-to-one onto mapping

$$ f : \bigcup_{Q'} \left(B_{Q'} \times A_{Q-Q'}\right) \to C_Q , \tag{12} $$

i.e., each sequence in $C_Q$ can be uniquely associated with a pair of sequences from $B_{Q'}$
and $A_{Q-Q'}$ for some value of $Q'$, and vice versa. The mapping $f$ is schematically shown
in Fig. 6. Basically, the sequence from $A_{Q-Q'}$ is inserted into the sequence from $B_{Q'}$ at
its midpoint to create a sequence in $C_Q$. After such an insertion we obtain a sequence of
length $2(N-L) + 2L - N = N$, which contains a segment of charge $Q - Q' + Q' = Q$ of
length $(N-L) + (2L-N) = L$. Thus we created an $N$-step sequence with a $Q$-segment
of size $L$. From the process of construction it is clear that this is the largest $Q$-segment
in the sequence: if a larger $Q$-segment had existed in the resulting chain, we could have
reversed the process by removing a segment of length $2L-N$ from the center of the chain.
This would have yielded a $2(N-L)$-step chain whose largest $Q'$-segment was longer
than half of its entire length, contradicting the initial assumption regarding the chain from
the set $B_{Q'}$. The "reversibility" of the process also proves the one-to-one correspondence
between the sets. It should be stressed, however, that this process requires that the midpoint
of the resulting $N$-step sequence is necessarily included in the largest $Q$-segment. Thus,
the proof is valid only for $L > N/2$.

Since $A_{Q_1}$ and $A_{Q_2}$ are disjoint when $Q_1 \neq Q_2$, equating the number of elements in the
domain and range of $f$ gives the identity

$$ P_N(L,Q) = \sum_{Q'} W_{2L-N}(Q-Q')\,P_{2(N-L)}(N-L,Q') . \tag{13} $$
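
Since Eq. (13) is an exact identity between finite sums, it can be tested by brute force for small $N$. The following sketch enumerates all sequences of the required lengths; `W`, `P_table` and `rhs_eq13` are hypothetical names, and the quadratic scan over segment endpoints is chosen for transparency, not efficiency.

```python
from fractions import Fraction
from itertools import product
from math import comb


def W(n, Q):
    """Exact probability that an n-step +/-1 walk ends at displacement Q."""
    if abs(Q) > n or (n + Q) % 2:
        return Fraction(0)
    return Fraction(comb(n, (n + Q) // 2), 2**n)


def P_table(n):
    """Return {(L, Q): P_n(L, Q)} by enumerating all 2**n sequences."""
    counts = {}
    for seq in product((1, -1), repeat=n):
        S = [0]
        for q in seq:
            S.append(S[-1] + q)
        longest = {}                       # charge Q -> longest segment length
        for i in range(n + 1):
            for j in range(i, n + 1):
                Q = S[j] - S[i]
                if j - i > longest.get(Q, -1):
                    longest[Q] = j - i
        for Q, L in longest.items():
            counts[(L, Q)] = counts.get((L, Q), 0) + 1
    return {key: Fraction(c, 2**n) for key, c in counts.items()}


def rhs_eq13(N, L, Q, tables):
    """Right-hand side of Eq. (13); tables[M] must hold P_table(M)."""
    M = 2 * (N - L)
    return sum(W(2 * L - N, Q - Qp) * tables[M].get((N - L, Qp), 0)
               for Qp in range(-M, M + 1))
```

For $L = N$ the sum collapses to $W_N(Q)$ and for $L = N/2$ it reduces to an identity, so the nontrivial content of the test lies in the intermediate values of $L$.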

Taking the continuum limit of the above equation, we replace the probabilities $P$ by the
probability density $p$, and the discrete probability $W$ by its continuum (Gaussian) form
which follows from Eq. (3), and obtain

$$ p(\ell,q) = \frac{1}{\sqrt{4\pi(2\ell-1)(1-\ell)}} \int_{-\infty}^{+\infty} dq'\, \exp\!\left[-\frac{\left(q - q'\sqrt{2(1-\ell)}\right)^2}{2(2\ell-1)}\right] p\!\left(\tfrac{1}{2},q'\right) , \tag{14} $$

where $q' = Q'/\sqrt{2(N-L)}$ is the reduced charge of the $2(N-L)$-step sequence. Since the
equation is linear in the function $p$, it cannot be used
to determine proportionality constants. (Since the equation is valid only for $\ell > 1/2$, the
normalization condition of $p$ cannot be used either.) Eq. (14) expresses an unknown function
in an interval of $\ell$s via the values of the same unknown function at a particular point $\ell = 1/2$.
Despite these limitations, Eq. (14) can be utilized to explain some properties of $p(\ell,q)$
and to extract information using alternative methods, as will be explained below. Before
proceeding, we note that in the $\ell \to 1/2$ limit the Gaussian term in the integrand of Eq. (14)
(the exponential term with the prefactor $1/\sqrt{2\pi(2\ell-1)}$) becomes $\delta(q - q')$, and the integral
relation reduces to an identity.

By integrating both sides of Eq. (14) over $q$, we find a relation between the areas $A_\ell$, for
$\ell > 1/2$:

$$ A_\ell \equiv \int_{-\infty}^{+\infty} dq\, p(\ell,q) = \frac{1}{\sqrt{2(1-\ell)}} \int_{-\infty}^{+\infty} dq\, p\!\left(\tfrac{1}{2},q\right) , \tag{15} $$

which confirms the observation from the MC data that for $\ell > 1/2$, $A_\ell$ is proportional to
$1/\sqrt{1-\ell}$. The relation (15) provides a method for measuring the otherwise unknown
proportionality constant by a detailed calculation of the probability density at $\ell = 1/2$, i.e. a
measurement of $A_{1/2}$.

In the $\ell \to 1$ limit, the variable $q'$ disappears from the exponent in Eq. (14), and the
relation reduces to

$$ p(\ell \to 1, q) = \frac{A_{1/2}}{\sqrt{4\pi(1-\ell)}}\,e^{-q^2/2} . \tag{16} $$

This relation both confirms our contention that $p(\ell,0)$ has a square-root divergence
$A/\sqrt{\pi(1-\ell)}$ with $A = \frac{1}{2}A_{1/2}$, and demonstrates that the fixed-$\ell$ sections of the surface
in Fig. 4 approach a pure Gaussian shape when $\ell \to 1$.

The proportionality coefficient $A$ of the square-root divergence is simply related to the sum
over $Q$ of the probabilities for the largest $Q$-segment to be exactly half of the length of
the RS. By complete enumeration we calculated the probabilities $P_M(M/2,Q)$ for all $Q$ and
$M < 30$, and formed the sums $A(M) = \frac{\sqrt{M}}{2}\sum_Q P_M(M/2,Q)$. (Only even sequence lengths
$M$ were used.) The sums $A(M)$ converge to $A$ in the $M \to \infty$ limit. Fig. 7 depicts the
sequence of the estimates $A(M)$ plotted versus $1/M$. The extrapolation to $1/M \to 0$ provides
an estimate $A = \frac{1}{2}A_{1/2} = 1.011 \pm 0.001$. This result is consistent with the MC estimates
of $A$, and has smaller error bars. It is interesting to note that despite the fact that $A$ is
almost unity, it is definitely larger than 1.
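
The first few estimates $A(M)$ can be reproduced with a short enumeration. The sketch below assumes the normalization $A(M) = \frac{\sqrt{M}}{2}\sum_Q P_M(M/2,Q)$, which follows from discretizing $A = \frac{1}{2}\int dq\,p(1/2,q)$ with $p = (M/2)P_M$ and spacing $\Delta q = 2/\sqrt{M}$; the names `half_length_weight` and `A_estimate` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def half_length_weight(M):
    """Sum over Q of P_M(M/2, Q), by enumerating all 2**M sequences (M even)."""
    total = 0
    for seq in product((1, -1), repeat=M):
        S = [0]
        for q in seq:
            S.append(S[-1] + q)
        longest = {}                       # charge Q -> longest segment length
        for i in range(M + 1):
            for j in range(i, M + 1):
                Q = S[j] - S[i]
                longest[Q] = max(longest.get(Q, 0), j - i)
        # count the charges whose longest segment is exactly half the chain
        total += sum(1 for L in longest.values() if L == M // 2)
    return Fraction(total, 2**M)


def A_estimate(M):
    """Finite-size estimate A(M) of the prefactor A."""
    return sqrt(M) / 2 * half_length_weight(M)


if __name__ == "__main__":
    for M in (2, 4, 6, 8, 10):
        print(M, A_estimate(M))
```

Enumeration at these sizes takes well under a second; reaching the sequence lengths used for Fig. 7 would need a more economical algorithm than this double loop over segment endpoints.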

Finally, we note that the discrete relation in Eq. (13) can be used to produce exact
analytical forms for $P_N$. Consider cases when $L = N - M$ and $M$ is a small number.
Eq. (13) can be rewritten in terms of $M$ as follows:

$$ P_N(N-M,Q) = \sum_{Q'} W_{N-2M}(Q-Q')\,P_{2M}(M,Q') . \tag{17} $$

Consider the case of, say, $M = 2$. The function $P_4(2,Q')$ is nonzero only for $Q' = 0, \pm 2, \pm 4$,
and can be easily found for those cases by examining all random sequences of length 4. The
function $W_{N-4}(Q-Q')$ is known exactly for arbitrary values of $N$ and $Q - Q'$. The sum
over $Q'$ is finite (it contains only 5 terms) and therefore can be performed. As a result
we can find an exact expression for $P_N(N-2,Q)$ for an arbitrary value of $N$. A similar
procedure can be performed for $M = 4$. Thus, for arbitrarily large (even) $N$ we get



$$ P_N(N-2,0) = 2^{2-N}\,\frac{(N-2)!}{\{[(N-2)/2]!\}^2} , $$

$$ P_N(N-4,0) = 2^{1-N}\,\frac{(N-8)!}{\{[(N-8)/2]!\}^2}\;\frac{91N^2 - 1186N + 3576}{(N-4)(N-6)} . $$

Unfortunately, the expressions become increasingly complex with increasing $M$, and it is
not possible to use this method to determine the continuum limit of $p(\ell,q)$.
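
The $M = 2$ result can be compared with direct enumeration. The sketch below checks only the expression for $P_N(N-2,0)$, read as $2^{2-N}(N-2)!/\{[(N-2)/2]!\}^2$, which is what Eq. (17) gives when $P_4(2,0) = 1/2$ and $P_4(2,\pm 2) = 1/4$ are inserted; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def longest_neutral_segment(seq):
    """Length of the longest 0-segment of seq (entries +1/-1)."""
    first_seen = {0: 0}          # first index at which each value S_i occurs
    S = best = 0
    for i, q in enumerate(seq, start=1):
        S += q
        if S in first_seen:
            best = max(best, i - first_seen[S])
        else:
            first_seen[S] = i
    return best


def P_exact(N, L):
    """Exact P_N(L, 0) by enumerating all 2**N sequences."""
    hits = sum(1 for seq in product((1, -1), repeat=N)
               if longest_neutral_segment(seq) == L)
    return Fraction(hits, 2**N)


def closed_form_M2(N):
    """2^(2-N) (N-2)! / {[(N-2)/2]!}^2 for even N >= 4."""
    half = (N - 2) // 2
    return Fraction(factorial(N - 2), 2**(N - 2) * factorial(half)**2)
```

Written this way the $M = 2$ expression coincides with $W_{N-2}(0)$ of Eq. (3), which makes the statement that it reproduces $A = 1$ immediate.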

We did not find analogous integral relations for $\ell < 1/2$. Here, the situation is complicated
by the fact that, in a given sequence, there may be several longest $Q$-segments that are
disjoint.



V. EXTREMAL SEGMENTS AND THE "STAIRCASE PROBLEM" 

In this Section, we define a new problem in the theory of random walks, related to two
simultaneous walkers, and analyze it in detail. We derive the relation between this problem
and the problem of extremal segments, and use this relation to investigate the properties of
$p(\ell,0)$ in the $\ell \to 1$ limit.

Consider a random sequence (walk) uj = {qi,q2, ■ ■ ■ An}- It can be graphically repre- 
sented by a plot of Si versus i, where Si represents the total displacement of the zth step 
from the origin of the walk. Let us define the following variables: 

Mi(u) = max {S (u), Si(u),--- S t (uj)} , (18) 
rrii(uj) =min{S (uj),Si(uj),---Si(uj)} . (19) 

The variables $M_i$ and $m_i$ represent the maximal and minimal coordinates achieved by the
random walker up to (and including) the $i$th step. In Fig. 8a, the dot-dashed and dotted
lines depict $M_i$ and $m_i$, respectively, corresponding to a RS $\omega$ shown above the graph.
(The corresponding $S_i$ is depicted by the solid line.) The variable $M_i$ ($m_i$) is a monotonic
non-decreasing (non-increasing) function of $i$ which graphically looks like an ascending
(descending) staircase. One can also view $M_i$ and $m_i$ as two walls that contain the entire
RW. Initially the walls are located at $M = m = 0$, and they gradually separate from each
other: whenever the random walker inside reaches a wall and performs an additional step
in the direction of the wall, it pushes the wall to a new position, thus increasing the distance
between the walls.
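
The two staircases can be generated directly from the step sequence with running maxima and minima. The sketch below is a minimal illustration; the function name `staircases` is hypothetical and the input is assumed to be a list of $\pm 1$ charges.

```python
from itertools import accumulate


def staircases(steps):
    """Return (S, M, m): positions, running maxima and running minima for i = 0..N."""
    positions = [0] + list(accumulate(steps))   # S_0 = 0, S_i = q_1 + ... + q_i
    upper = list(accumulate(positions, max))    # M_i, the ascending staircase
    lower = list(accumulate(positions, min))    # m_i, the descending staircase
    return positions, upper, lower


if __name__ == "__main__":
    S, M, m = staircases([1, -1, -1, 1, 1, 1])
    print(S)
    print(M)
    print(m)
```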

Consider two RSs, $\omega_1$ and $\omega_2$, selected from $\Omega_L$. We are interested in the probability

$$\phi_L = \mathrm{Prob}\big(S_i(\omega_2) > M_i(\omega_1),\ 1 \le i \le L\big) \tag{20}$$

that the path $\omega_2$ remains above the running maximum of $\omega_1$ for the first $L$ steps.
The dotted line in Fig. 9 depicts the RS $\omega_1$, which generates the staircase (solid line) that
the RS $\omega_2$ is supposed to remain entirely above. We refer to the determination of $\phi_L$
as the "staircase problem." The dot-dashed line in Fig. 9a depicts a permitted $\omega_2$, while the
dot-dashed lines in Figs. 9b and 9c show examples of forbidden cases. (Analogously, one
can define the problem of a RW staying below $m_i$, and the problem of a RW staying either above
$M_i$ or below $m_i$, i.e., staying outside the walls pushed by the RS $\omega_1$.) Every step of the
staircase begins when the RS $\omega_1$ arrives at that particular maximal value of $S$ for the first
time. The step ends when the sequence exceeds that value for the first time. The sizes of
these steps are independent of each other, and their distribution is given by the first
arrival time at position 1, i.e., $\mathrm{Prob}(\text{size of a step} = k) = k^{-1}\,\mathrm{Prob}(S_k = 1) \sim k^{-3/2}$. (For a
general expression of first arrival times see Ref. 0.) This probability is normalizable, but
the mean step size is divergent.
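
The step-size law can be checked for short walks. The sketch below compares the first-arrival formula $k^{-1}\,\mathrm{Prob}(S_k = 1)$ with a direct enumeration of all $2^k$ walks; the helper names are illustrative and not taken from the paper.

```python
from itertools import product
from math import comb


def first_passage_formula(k):
    """Prob(first arrival at position +1 happens at step k) = Prob(S_k = 1) / k."""
    if k % 2 == 0:
        return 0.0  # position +1 can only be reached after an odd number of steps
    return comb(k, (k + 1) // 2) / (k * 2 ** k)


def first_passage_enumerated(k):
    """Same probability from a brute-force enumeration of all k-step walks."""
    hits = 0
    for steps in product((1, -1), repeat=k):
        position = 0
        arrival = None
        for i, q in enumerate(steps, start=1):
            position += q
            if position == 1:
                arrival = i
                break
        if arrival == k:
            hits += 1
    return hits / 2 ** k


if __name__ == "__main__":
    for k in range(1, 12, 2):
        # k**1.5 * probability should level off, reflecting the k**(-3/2) tail.
        print(k, first_passage_formula(k), k ** 1.5 * first_passage_formula(k))
```

Because the tail decays as $k^{-3/2}$, the partial sums of these probabilities converge while the partial sums of $k$ times the probability keep growing, which is the statement that the mean step size diverges.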

The probability $\phi_L$ of $\omega_2$ staying above the staircase after $L$ steps decreases with
increasing $L$. It is easy to put loose upper and lower bounds on $\phi_L$. (i) $\omega_2$ needs to remain
above the origin up to the $L$th step, since $M_i(\omega_1) \ge 0$. Thus, $\phi_L$ decays faster than $L^{-1/2}$,
which is the asymptotic behavior of the probability of never returning to the origin given
by Eq. (|^). (ii) The condition is satisfied if $\omega_1$ remains completely below the origin and $\omega_2$
remains above the origin up to step $L$. Therefore, $\phi_L$ decays slower than $L^{-1}$. Given these
bounds, it is reasonable to expect an asymptotic power law for $\phi_L$:

$$\lim_{L \to \infty} \phi_L = C_\phi L^{\alpha - 1}, \tag{21}$$

where $0 < \alpha < 1/2$. We will later argue that $\alpha = 1/4$. We performed a MC investigation of
the staircase problem for $L$ ranging from 10 to 40960 and sample sizes of about $3 \times 10^5 L^{3/4}$
(yielding approximately $3 \times 10^5$ survival events for each $L$), and confirmed this particular
value of $\alpha$ to within one percent. Fig. 10 shows $\phi_L \times L^{3/4}$ as a function of $1/\log_2(L)$. The fact
that this combination remains independent of $L$ when $L \to \infty$ demonstrates the assumed
power law. The points on the graph provide successive estimates of the prefactor $C_\phi$; the
error bars indicate statistical uncertainties (one standard deviation) for each $L$. We estimate
the asymptotic value of the coefficient as $C_\phi = 0.263 \pm 0.001$.
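
A minimal sketch of how $\phi_L$ can be estimated is given below, assuming unbiased $\pm 1$ steps for both walkers. It provides an exact enumeration for very short $L$ and a plain Monte Carlo estimator; the function names, the seed handling and the sample sizes are illustrative choices, not those of the study.

```python
import random
from itertools import product


def stays_above(omega1, omega2):
    """True if S_i(omega2) > M_i(omega1) for every step i = 1..L."""
    s1 = s2 = running_max = 0
    for q1, q2 in zip(omega1, omega2):
        s1 += q1
        s2 += q2
        running_max = max(running_max, s1)  # M_i(omega1)
        if s2 <= running_max:
            return False
    return True


def phi_exact(length):
    """Exact phi_L from all 4**L pairs of walks (feasible for small L only)."""
    walks = list(product((1, -1), repeat=length))
    hits = sum(stays_above(w1, w2) for w1 in walks for w2 in walks)
    return hits / 4 ** length


def phi_monte_carlo(length, samples, seed=0):
    """Monte Carlo estimate of phi_L from `samples` random pairs of walks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        w1 = [rng.choice((1, -1)) for _ in range(length)]
        w2 = [rng.choice((1, -1)) for _ in range(length)]
        hits += stays_above(w1, w2)
    return hits / samples


if __name__ == "__main__":
    for length in (1, 2, 3, 4):
        print(length, phi_exact(length), phi_monte_carlo(length, 20000))
```

Multiplying such estimates by $L^{3/4}$ for increasing $L$ is one way to probe the conjectured power law, although reaching the asymptotic regime requires much longer walks and larger samples than this sketch suggests.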
A very closely related probability distribution is

$$\tilde{\phi}_L = \mathrm{Prob}\big(S_i(\omega_2) > M_{i-1}(\omega_1),\ 1 \le i \le L\big), \tag{22}$$

i.e., this time the two paths are allowed to meet at positions where $\omega_1$ has reached a new
maximum. Figs. 9a and 9c both correspond to the permitted events in the definition of $\tilde{\phi}_L$.
Now let

$$f_L = \mathrm{Prob}\big(S_i(\omega_2) > M_i(\omega_1),\ 1 \le i \le L-1;\ S_L(\omega_2) = S_L(\omega_1)\big) \tag{23}$$

denote the probability of such a meeting occurring for the first time at step L. Meeting at the 
Lth step represents an extremely simple event, i.e., despite the fact that we are considering 
the behavior of two random walkers, it is easy to construct all possible cases for short L. 



In Fig. 11, the solid and dot-dashed lines represent $\omega_1$ and $\omega_2$, respectively, for $L = 1$, 3,
4 and 5. We can see that there is a single possibility for $L = 1, 3, 4$ and five possibilities
for $L = 5$. (The diagram in the bottom right represents 4 different cases; the dashed lines
indicate the alternative segments in both $\omega_1$ and $\omega_2$.) $f_L$ is simply equal to $2^{-2L}$ (the probability
of a single diagram, which fixes $L$ steps of each of the two walkers) multiplied by the number
of distinct such diagrams. Since $\{f_i\}$ is a rapidly converging series, we can easily evaluate the
infinite sum $\sum_i f_i$ to a high accuracy by summing the first few terms. (The convergence of the
infinite series $\sum_i f_i$ can be easily seen from the fact that it is bounded from above by the
probability that $\omega_1$ is at a maximum when the two RWs meet for the first time.) We can use
the probabilities $f_i$ to relate $\tilde{\phi}_L$ to $\phi_L$ via the following relation:

$$\tilde{\phi}_L = \phi_L + \sum_{L_1=1}^{L} f_{L_1} \phi_{L-L_1} + \sum_{L_1=1}^{L} \sum_{L_2=1}^{L-L_1} f_{L_1} f_{L_2} \phi_{L-L_1-L_2} + \cdots. \tag{24}$$



Fast decay of $f_L$ with increasing $L$ allows the replacement of $\phi_{L-L_1}, \phi_{L-L_1-L_2}, \ldots$ in Eq. (24)
by $\phi_L$ in the $L \to \infty$ limit, leading to

$$\tilde{\phi}_L = C_f \phi_L. \tag{25}$$



The coefficient $C_f = (1 - \sum_i f_i)^{-1}$ can be calculated to high accuracy by summing up the
series $\{f_i\}$. We have obtained the value $C_f = 1.413 \pm 0.005$ by extrapolating from finite sums
of $f_i$, which we have obtained exactly for $L$ up to 18, and up to $L = 100$ using a Monte Carlo method.

The results are shown in Fig. 12.
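
The first few $f_L$ can be reproduced by direct enumeration. The sketch below assumes that a "meeting" at step $L$ means that $\omega_1$ reaches a new maximum at step $L$ and $\omega_2$ sits at that same position, and that each pair of $L$-step walks carries weight $4^{-L}$; under these assumptions it recovers the diagram counts quoted in the text for $L = 1, 3, 4, 5$. Function names are illustrative.

```python
from itertools import product


def prefix_positions(steps):
    """Positions S_1..S_L of a walk given its steps."""
    positions, s = [], 0
    for q in steps:
        s += q
        positions.append(s)
    return positions


def first_meeting_count(length):
    """Number of (omega1, omega2) pairs that first meet at step `length`,
    at a position where omega1 has just reached a new maximum."""
    walks = [prefix_positions(w) for w in product((1, -1), repeat=length)]
    count = 0
    for p1 in walks:
        maxima, running_max = [], 0
        for s in p1:
            running_max = max(running_max, s)
            maxima.append(running_max)          # M_i(omega1), i = 1..L
        previous_max = maxima[-2] if length > 1 else 0
        if p1[-1] <= previous_max:
            continue                            # no new maximum at the last step
        for p2 in walks:
            if p2[-1] == p1[-1] and all(
                p2[i] > maxima[i] for i in range(length - 1)
            ):
                count += 1
    return count


def c_f_partial(max_length):
    """Finite-sum estimate of C_f = 1 / (1 - sum_i f_i), keeping i <= max_length."""
    total = sum(first_meeting_count(n) / 4 ** n for n in range(1, max_length + 1))
    return 1.0 / (1.0 - total)


if __name__ == "__main__":
    print([first_meeting_count(n) for n in range(1, 7)])
    print(c_f_partial(6))
```

Since all $f_L$ are non-negative, every finite sum gives a lower bound on $C_f$, which is why an extrapolation to $1/L = 0$ is needed for the final value.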



Finally, we are in a position to discuss the connection of the staircase problem to the
problem of our main interest. For simplicity, let us only consider $P_N(L, 0)$ in the $L/N \to 1$
limit and examine all RSs with $S_N(\omega) > 0$ whose largest neutral segments are $L$ steps long.
To construct such a sequence, we can start with a neutral segment $\omega_0$ of size $L$, depicted
by a solid line in Fig. 13. This segment is completed into the $N$-step RS by adding pieces
to its two ends (thick dashed and dotted lines in Fig. 13) in such a way that a larger
neutral segment is not created. In order to avoid overcounting when there is more than
one largest neutral segment, we can for example require that the initially selected segment
is the leftmost of all largest segments. Let $L'$ be the size of the piece $\omega_R$ added to the
RHS of $\omega_0$. (The LHS piece $\omega_L$ will then have length $N - L - L'$.) To avoid creating a
larger neutral segment which begins somewhere inside $\omega_0$ and ends somewhere inside $\omega_R$,
the sequence $\omega_R$ must remain above the staircase generated by the successive maxima of
$\omega_0$, i.e., if the sequence $\omega_R$ is translated to the beginning of the sequence $\omega_0$ (as depicted
by the thin dashed line in Fig. 13) they must satisfy the conditions defined in the staircase
problem. Similar restrictions apply to the segment $\omega_L$; however, this time both $\omega_0$ and $\omega_L$
should be viewed "backwards" (thin dotted line in Fig. 13). (Formally, for any sequence
$\omega$ it is convenient to define a conjugate sequence $\omega^*$, which consists of the elements $q_i$ of $\omega$
written in reverse order, as illustrated in the Fig. [|b. The conjugate of a given path can be
obtained by rotating the original path by 180° around the axis normal to the plane. Thus,
the staircase conditions have to be satisfied between the sequences $\omega_0^*$ and $\omega_L^*$.) Since $\omega_0$ is a
neutral segment, its elements are not completely independent, while our original definition of
the staircase problem required the presence of two completely random sequences. However,
when $N - L \ll N$, the two ends of $\omega_0$ can be treated approximately as independent RSs, and
they become completely independent in the $(N - L)/N \to 0$ (i.e., $\ell \to 1$) limit. Finally, we
notice that the above requirements were somewhat over-restrictive: we are allowed to create
neutral segments exactly of length $L$ between $\omega_0$ and $\omega_R$, and therefore the probability will
be described by $\tilde{\phi}_{L'}$ rather than by $\phi_{L'}$. The segment $\omega_L$, however, must satisfy probabilities
described by $\phi_{N-L-L'}$ because we initially required that the neutral segment created by $\omega_0$ is the
leftmost segment in the RS. This yields
$$P_N(L, 0) = 2 \sum_{L'=0}^{N-L} \phi_{N-L-L'} \, W_L(0) \, \tilde{\phi}_{L'}, \tag{26}$$

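the factor 2 coming from RWs with $S_N(\omega) < 0$. Finally, taking the sum over $L'$ in the large
$N$ limit, we obtain (for even $L$)

$$\lim_{L/N \to 1} P_N(L, 0) = \frac{2 C_f C_\phi^2}{(N-L)^{1/2-2\alpha}} \sqrt{\frac{2}{\pi L (N-L)}} \int_0^1 \frac{dx}{[x(1-x)]^{1-\alpha}}
= \frac{\Gamma(\alpha)\Gamma(\alpha)}{\Gamma(2\alpha)} \, \frac{2 C_f C_\phi^2}{(N-L)^{1/2-2\alpha}} \sqrt{\frac{2}{\pi L (N-L)}}. \tag{27}$$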



In the above, $\Gamma(x)$ is the gamma (factorial) function. This result has several remarkable
consequences. First of all, it suggests that $p(\ell, 0)$ has a well-behaved continuum limit
only if $\alpha = 1/4$, since otherwise the factor $(N-L)^{2\alpha - 1/2}$ would leave a residual dependence
on $N$. This implies that $\phi_L \sim L^{-3/4}$, a result we have not yet found in the literature.

Knowledge of $C_\phi$ and $C_f$ now enables an independent calculation of the proportionality
coefficient $A$ through the relation $A = \sqrt{2}\, C_\phi^2 C_f [\Gamma(1/4)]^2 / \Gamma(1/2) = 1.025 \pm 0.015$. Although
it is slightly larger and less accurate, this result is consistent with other estimates of $A$.
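
The quoted value of $A$ follows from simple arithmetic on $C_\phi$, $C_f$ and the gamma function. The sketch below evaluates the relation $A = \sqrt{2}\, C_\phi^2 C_f [\Gamma(1/4)]^2/\Gamma(1/2)$ with the central values given in the text; the function name and the default argument are illustrative.

```python
from math import gamma, sqrt


def amplitude(c_phi, c_f, alpha=0.25):
    """A = sqrt(2) * C_phi**2 * C_f * Gamma(alpha)**2 / Gamma(2 * alpha)."""
    return sqrt(2.0) * c_phi ** 2 * c_f * gamma(alpha) ** 2 / gamma(2.0 * alpha)


if __name__ == "__main__":
    # Central values quoted in the text: C_phi = 0.263, C_f = 1.413.
    print(round(amplitude(0.263, 1.413), 3))
```

Because $C_\phi$ enters squared, its relative uncertainty contributes twice to the relative uncertainty of $A$.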



VI. HIGHER DIMENSIONS 

The fact that $p(\ell, 0)$ has a singularity at $\ell = 1$ is a consequence of the fact that a RW in
one dimension returns to its starting position very often. Thus, it is clear that the behavior
of $p(\ell, 0)$ depends strongly on the dimensionality of the RW. In order to investigate this,
we have generalized the original problem to RWs on a $d$-dimensional hypercubic lattice.
Now the "elementary charge" (scalar) of the one-dimensional problem is replaced by an
elementary step (vector) between neighboring sites on that lattice along one of $2d$ possible
directions, and there are $(2d)^N$ possible $N$-step walks. (We cannot use the analogy with
the sequence of charges anymore.) The probability distribution $P_N(L, Q)$ can be easily
generalized:

$$P_N(L, Q) \to P_N^{(d)}(L, \mathbf{Q}),$$
$$p(\ell, q) \to p^{(d)}(\ell, \mathbf{q}),$$

where $\mathbf{Q} = (Q_1, \ldots, Q_d)$ is now the $d$-dimensional displacement of a segment in the RW,
and $\mathbf{q} = (dN)^{-1/2} \mathbf{Q}$.

Most of the arguments used to explore the features of one-dimensional RWs can be
applied with minor changes to the $d$-dimensional walks. As an example, let us consider the
qualitative derivation of the asymptotic properties of $p(\ell, 0)$ in the $\ell \to 1$ limit as derived
for the $d = 1$ case in Sec. |TJ: As in the one-dimensional case we may assume that the
length of the longest loop can be approximately thought of as a function of the overall
displacement $\mathbf{Q}_0$ (end-to-end vector) of the entire walk. Under such an assumption we expect
$L \sim N - a|\mathbf{Q}_0|^2$, which is analogous to the one-dimensional case, except that the overall charge
$Q_0$ is replaced by the modulus (length) of the vector $\mathbf{Q}_0$. The generalization of Eq.(||)
to $d$ dimensions is
$$p^{(d)}(\ell, 0) \approx \frac{N}{2} \, W_N^{(d)}\big(|\mathbf{Q}_0(L)|\big) \left| \frac{d|\mathbf{Q}_0|}{dL} \right|, \tag{28}$$



where the one-dimensional $W_N(Q_0)$ of Eq. (5) has been replaced by $W_N^{(d)}(|\mathbf{Q}_0|)$, which is the
probability that the length of a $d$-dimensional end-to-end vector of an $N$-step RW is $|\mathbf{Q}_0|$.
Near $\mathbf{Q}_0 = 0$ this probability is proportional to $N^{-d/2} |\mathbf{Q}_0|^{d-1}$. Substituting this expression
into Eq. (28) and using the relation between $L$ and $|\mathbf{Q}_0|$, we find $p(\ell, 0) \sim (1 - \ell)^{(d-2)/2}$. Thus,
we expect the probability density to approach a constant in the $\ell \to 1$ limit in $d = 2$, and
to decay to zero as $\sqrt{1 - \ell}$ in $d = 3$.
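
The power counting behind this exponent can be spelled out as follows. This is a sketch under the stated assumptions ($L \simeq N - a|\mathbf{Q}_0|^2$ and $W_N^{(d)} \propto N^{-d/2}|\mathbf{Q}_0|^{d-1}$ near the origin), with all constant prefactors dropped:

```latex
\begin{align}
  |\mathbf{Q}_0| &= \sqrt{(N-L)/a},
  \qquad
  \left|\frac{d|\mathbf{Q}_0|}{dL}\right| = \frac{1}{2\sqrt{a\,(N-L)}}, \\
  p^{(d)}(\ell,0) &\sim N \cdot N^{-d/2}\,(N-L)^{(d-1)/2}\,(N-L)^{-1/2}
   = N^{1-d/2}\,(N-L)^{(d-2)/2} \\
  &= N^{1-d/2}\,N^{(d-2)/2}\,(1-\ell)^{(d-2)/2}
   = (1-\ell)^{(d-2)/2}.
\end{align}
```

All powers of $N$ cancel, so the result depends on $\ell$ alone, as required for a continuum limit.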

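The relations which have been demonstrated from an approximate argument above can be
proven exactly by generalizing Eq. (|HD to $d$ dimensions. The generalization is straightforward
and leads to the form

$$p^{(d)}(\ell, \mathbf{q}) = \frac{1}{2(1-\ell)} \left( \frac{1-\ell}{\pi(2\ell-1)} \right)^{d/2} \int_{-\infty}^{+\infty} d^d q' \, \exp\left[ -\frac{\big|\mathbf{q} - \mathbf{q}'\sqrt{2(1-\ell)}\big|^2}{2(2\ell-1)} \right] p^{(d)}(1/2, \mathbf{q}'). \tag{29}$$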



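Thus, for the $\ell \to 1$ limit we obtain

$$p^{(d)}(\ell \to 1, \mathbf{q}) = \frac{A^{(d)}_{1/2}}{2\pi^{d/2}} \, (1-\ell)^{(d-2)/2} \, e^{-|\mathbf{q}|^2/2}. \tag{30}$$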



Fig. 14 depicts $p^{(d)}(\ell, 0)$ for $d = 1$, 2 and 3, obtained from MC simulations with $N = 1000$ and
samples of $10^8$, $10^6$ and $10^6$ RWs, respectively. The peak of the distribution shifts towards
$\ell = 0$ as the dimensionality is increased. Fig. 14 also demonstrates the verification of the
form $p^{(d)}(\ell \to 1, 0) \sim (1 - \ell)^{(d-2)/2}$ for these dimensions.

The asymptotic relation described above assumes that $A^{(d)}_{1/2}$ does not vanish. Note that

$$A^{(d)}_{1/2} \propto \lim_{N \to \infty} N^{1-d/2} \sum_{\mathbf{Q}} P_N^{(d)}(N/2, \mathbf{Q}).$$

For each sequence $\omega$, there are at most $N$ nonzero terms in the summation over $\mathbf{Q}$, and
$P_N^{(d)}(L, \mathbf{Q}) \le 1$ since it is a probability. Thus, $A^{(d)}_{1/2} \le \lim_{N \to \infty} N^{2-d/2}$. This implies that in
dimensions higher than 4, $p^{(d)}(\ell, \mathbf{q}) = 0$ for $\ell > 1/2$. It is easy to understand why $d = 4$ is a
special dimension in this problem: It is known from the study of self-avoiding random
walks that large loops are absent in space dimensions $d > 4$. Thus we expect that in terms
of the reduced variable $\ell$, all loops will have reduced "length" $\ell = 0$, i.e., $p^{(d)}(\ell, 0) = \delta(\ell)$ in
this regime.

While we expect an asymptotic probability density $\delta(\ell)$ for $d > 4$, it should be noted that
for finite $N$ the probability $P_N(L, 0)$ is a monotonically increasing function of $L$ for small
values of $L$. Therefore, the probability density $p(\ell, 0)$ measured for finite $N$ will have a peak
at a finite (small) value of $\ell$. As $N$ increases the entire distribution should drift towards $\ell = 0$.
Fig. 15 depicts such a trend for $d = 5$. A convenient measure of such behavior is the
value of $L$ below which most of the statistical weight lies, i.e., such that most loops are
shorter than this threshold value. We verified the approach of the distribution to a $\delta$-function
by examining the finite size scaling of the 90% threshold $L^*_d(N)$, defined through
$\sum_{L \le L^*_d(N)} P_N^{(d)}(L, 0) = 0.9$.
This means that 10% of the time, there is a loop larger than $L^*_d(N)$ in a $d$-dimensional RW
of size $N$. We examined the cases $d = 5$, 6, and 7 using the MC method described in the
Appendix, for $N$ ranging from 14 to 896 and sample sizes of $10^6$. The threshold lengths
are also shown in Fig. 15. We find that $L^*_d(N) \sim N^{1-\beta_d}$, where $\beta_5 \approx 0.27$, $\beta_6 \approx 0.44$ and
$\beta_7 \approx 0.55$. Since the exponents $\beta_d$ are positive, the threshold in terms of the reduced
variable, $\ell^*_d(N) \sim N^{-\beta_d}$, vanishes with increasing $N$.
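
The threshold analysis can be sketched in two small helpers: one extracts the 90% threshold from a histogram of largest-loop sizes, and one estimates $\beta_d$ from the slope of $\log L^*_d$ versus $\log N$. Both names are hypothetical, and an ordinary least-squares fit is assumed here since the text does not specify the fitting procedure.

```python
from math import log


def threshold_length(histogram, fraction=0.9):
    """Smallest L whose cumulative weight sum_{L' <= L} P(L') reaches `fraction`.

    `histogram[L]` holds the number of sampled walks whose largest loop has size L.
    """
    total = sum(histogram)
    cumulative = 0
    for length, count in enumerate(histogram):
        cumulative += count
        if cumulative >= fraction * total:
            return length
    return len(histogram) - 1


def scaling_exponent(sizes, thresholds):
    """Estimate beta from L* ~ N**(1 - beta) by a least-squares fit in log-log scale."""
    xs = [log(n) for n in sizes]
    ys = [log(t) for t in thresholds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return 1.0 - slope


if __name__ == "__main__":
    print(threshold_length([50, 30, 15, 5]))
    sizes = [14, 28, 56, 112, 224, 448, 896]
    print(scaling_exponent(sizes, [n ** 0.73 for n in sizes]))
```

The synthetic thresholds in the usage example are generated from an exact power law, so they only illustrate the fit and are not simulation data.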

The above arguments do not provide a definite answer for the borderline dimension
of $d = 4$. (The reader is reminded that the self-avoiding walk problem at the critical
dimension $d = 4$ slightly differs from the $d > 4$ cases: e.g., regular power laws are modified
by logarithmic corrections.) Through MC calculations with up to $10^7$ RW samples and
values of $N$ up to 5000, we find that in $d = 4$ the entire distribution $p^{(4)}(\ell, 0)$ can be fitted
very well to the form $\ell^{-a_1} e^{-a_2/\ell}$, for finite values of $N$. Fig. 16 depicts such a curve for $N =
1000$. (The sample size is $10^7$.) The peak position $a_2/a_1$ approaches 0 either logarithmically
($a_2/a_1 \sim 1/\ln N$), or with a very small power of $N$, i.e., $a_2/a_1 \sim N^{-\beta_4}$ where $\beta_4 \approx 0.16$.
Thus, the distribution still converges to a delta function in the continuum limit. Although
the qualitative behavior of $p^{(d)}(\ell, 0)$ is easily understood, it would be interesting to obtain
a quantitative understanding of the distribution, especially at the borderline dimension of
four.
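
For completeness, the identification of the peak position with $a_2/a_1$ follows from a one-line calculation on the assumed fitting form:

```latex
\begin{align}
  g(\ell) &= \ell^{-a_1} e^{-a_2/\ell},
  \qquad
  \frac{d \ln g}{d\ell} = -\frac{a_1}{\ell} + \frac{a_2}{\ell^{2}}, \\
  \frac{d \ln g}{d\ell} &= 0
  \quad\Longrightarrow\quad
  \ell_{\mathrm{peak}} = \frac{a_2}{a_1}.
\end{align}
```

The logarithmic derivative is positive for $\ell < a_2/a_1$ and negative beyond it, so this stationary point is indeed the maximum.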



VII. DISCUSSION 

The problem of extremal segments originated from the desire to consider a simplified 
description of the ground states of randomly charged polymers. We used MC, exact enumer- 
ation and analytical techniques to analyze the problem, and our results provide convenient 
tools for a semi-quantitative analysis of the ground states of PAs. In particular, we show
that a "typical" RS contains very large neutral segments, i.e. it is possible to construct a 
ground state from a single very large blob with relatively short ends of the chain dangling 
outside the blob. 

Besides the original motivation, the problem of extremal segments is interesting in its own
right. It looks like one of the classical problems of random walks; nevertheless, it is highly
non-trivial, and the results indicate a solution with a very rich and unexpected structure.
The problem can be related to other interesting problems of RWs, such as the "staircase
problem." While several features of the problem have been established analytically, we did
not find a complete analytical solution. We think that such a solution is
possible and that further attempts at finding it are worthwhile. The generalization of the problem
to arbitrary space dimension $d$ is not related to the original problem of charged polymers,
but is nevertheless interesting in its own right.

The numerical "proof" of the continuum limit in our work was limited to a particular class 
of RWs, in which a unit displacement appears at each step. Within that class we presented 
evidence of a continuum limit where the properly scaled functions become independent of 
N. It would be interesting to perform a numerical test of the "universality" of the solution 
for a broader class of RWs. It may be possible to prove the universality of the continuum 
limit by attempting to perform a renormalization-group-like treatment of the problem, i.e. 
attempting to define the problem in the limit where the RW becomes a true Gaussian walk 
(the walk of an idealized Brownian particle). This limit, however, is far from trivial. In
particular, the definition of what is called a loop (i.e., how close two different points of the
walk should be located so that the segment will be called a closed loop) presents a non-trivial
problem in the continuum limit. Such a short-distance scale can undergo a non-trivial
scaling, similarly to the excluded volume parameter in the treatment of self-avoiding walks.
A different approach to the question of universality may begin from an expansion of the
solution near the dimension $d = 4$, as in the treatment of self-avoiding walks.
APPENDIX: NUMERICAL METHODS

In this appendix, we describe the numerical methods used in our study. All of the
algorithms were implemented on a Silicon Graphics R4000 workstation.

We use two approaches to attack the problem numerically: The first approach is to
compute the exact distribution $P_N(L, Q)$ for small values of $N$ by considering all possible
$N$-step walks. Since the computational time increases exponentially with $N$, this method
is practical only up to $N \approx 30$, and we have analyzed RWs with up to 36 steps this way. Thus,
in order to determine the scaling form $p(\ell, q)$, it is necessary to use a random sampling
of the set $\Omega_N$ for large values of $N$. Using such a Monte Carlo (MC) procedure, we have
investigated RWs of up to 1024 steps. Since the $q = 0$ case is especially interesting, we
have used more efficient algorithms to determine $p(\ell, 0)$ to a higher accuracy. For both the
exact enumeration and MC calculations, our algorithms require $O(N)$ operations to process
one sample from $\Omega_N$ for $p(\ell, 0)$, and $O(N^{3/2})$ operations to process the full probability
distribution $p(\ell, q)$. Further details on the individual algorithms, as well as the algorithm
used to determine $p^{(d)}(\ell, 0)$, are given below.

1. Algorithm for $p(\ell, 0)$

The only difference between the exact enumeration and MC algorithms involves the number of
RWs analyzed: In exact enumeration, the number of analyzed RWs increases exponentially
with $N$, whereas the samples are chosen at random in the MC routines, and the sample size
is usually set to a constant. Standard random number generators are used to generate the
RWs in the MC algorithm. For each RW, the size of the largest loop is determined and this
is recorded in a histogram (with sizes from 0 to $L$) that eventually represents the probability
distribution we are looking for. The determination of the largest neutral segment in a given
sequence is identical in both enumeration and MC algorithms, and is described below.

Given a RW $\omega$, an array $F(Q)$ stores the step number $i$ when $S_i(\omega) = Q$ for the first time.
Initially, $F(Q) = -1$ for all $Q$. At each step of the RW (including step 0), the current step
number $i$ is recorded in $F(S_i(\omega))$ if the site is visited for the first time, i.e., if $F(S_i(\omega)) = -1$.
If the site was visited earlier, the maximum loop size is replaced by the maximum of itself
and the difference $i - F(S_i)$. Since $F(S_i)$ stores the first time a site is visited, the largest
loop in the walk must correspond to one of such differences. A finite number of operations
is needed for each step, therefore this part of the algorithm involves $O(N)$ operations.



2. Algorithm for $p(\ell, q)$



The selection of RWs (enumeration or MC) and the creation of the histogram are also
straightforward for this more general problem. The main task is to find an efficient
algorithm that produces the sizes of the largest $Q$-segments (for all $Q$) in a given sequence $\omega$. A
straightforward generalization of the algorithm for $p(\ell, 0)$ would have required $O(N^2)$
operations per sequence. However, our algorithm takes advantage of the fact that the same
positions are visited many times, and it requires only $O(N^{3/2})$ operations instead. As usual,
the algorithm traces the sequence step by step. There are two main arrays. At a given step $i$,
one of them keeps track of the sizes of the largest $Q$-segments encountered thus far. The second
array is actually a dynamically allocated list of pairs of integers. Each pair in the list stores
a charge $q$ and the size of the largest $q$-segment that ends at the current step $i$. The size of this
array grows as $\sqrt{i}$ on average. At each increment of the step number, all pairs in the list are
updated by adding the next element in the sequence to $q$ and incrementing the corresponding
lengths by one. These lengths are then compared with the corresponding values in the first
array, which is updated if the new length is larger. A new element is added to the list of
pairs whenever the walk reaches a position for the first time, a condition that is checked
separately. All the operations in an update can be accomplished by a single pass through
the list of pairs, thus the whole algorithm requires only $O(N^{3/2})$ operations to complete, as
mentioned earlier.
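
A minimal sketch of this two-array scheme is given below. A dictionary stands in for the first array and a list of `[q, length]` pairs for the second; these containers, the function names, and the convention that the empty segment counts as a 0-segment of length 0 are assumptions of the sketch. A brute-force $O(N^2)$ routine is included only for cross-checking.

```python
def largest_q_segments(steps):
    """Map each charge Q to the length of the longest Q-segment in the sequence."""
    best = {0: 0}        # first array: longest Q-segment found so far
    pairs = [[0, 0]]     # second array: [q, length] of the longest q-segment
                         # ending at the current step, one pair per distinct site
    visited = {0}
    position = 0
    for step in steps:
        position += step
        for pair in pairs:               # single pass through the list of pairs
            pair[0] += step
            pair[1] += 1
            if pair[1] > best.get(pair[0], 0):
                best[pair[0]] = pair[1]
        if position not in visited:      # first arrival at a new position
            visited.add(position)
            pairs.append([0, 0])
    return best


def brute_force_q_segments(steps):
    """Reference O(N**2) implementation over all (start, end) pairs."""
    prefix = [0]
    for q in steps:
        prefix.append(prefix[-1] + q)
    best = {0: 0}
    for start in range(len(prefix)):
        for end in range(start + 1, len(prefix)):
            charge = prefix[end] - prefix[start]
            best[charge] = max(best.get(charge, 0), end - start)
    return best


if __name__ == "__main__":
    print(largest_q_segments([1, 1, -1]))
```

The list of pairs has one entry per distinct position visited so far, which is why its length grows roughly as $\sqrt{i}$ for an unbiased walk.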



3. Algorithm for $p^{(d)}(\ell, 0)$

For the MC determination of $p^{(d)}(\ell, 0)$ at higher dimensions, the $O(N)$ algorithm
described in Sec. A 1 requires $O(N^d)$ storage elements for the array $F(\mathbf{Q})$, which quickly
becomes prohibitive with increasing $d$. The storage requirement can be reduced to $O(dN)$
by storing the time series of the position $S_i(\omega)$ of the RW instead. However, the simplest
algorithms would require $O(N^2)$ operations to find the largest 0-segment given such a data
structure. Note that the typical RW in dimensions $d > 2$ does not revisit the same site more
than a few times, and therefore the total number of 0-segments in a RW should be only of
$O(N)$. We have taken advantage of this fact in order to devise an algorithm that requires
only $O(N \log N)$ operations to do the job. The algorithm is as follows:

After the position array $S(i)$ is formed, its contents [which are the position vectors
$(Q_1, \ldots, Q_d)$] are indexed in lexicographical order. This operation requires only $O(N \log N)$
operations when an efficient sorting algorithm like Heapsort [14] is used. All 0-segments in
the sequence start and end at the same position by definition, therefore the two endpoints
will be adjacent in the lexicographical index. Going through the index sequentially, it is
then possible to determine the largest of the 0-segments in only $O(N)$ operations. The
extraordinary speedup of this algorithm makes it possible to go up to sample sizes of $10^6$
for 1000-step RWs in 7 dimensions.
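
The sort-and-scan idea can be sketched as follows. Note that this illustration relies on Python's built-in `sorted` (which is not Heapsort) and breaks ties between equal positions by step number, so that the earliest visit of each site comes first in its group; the function names and the walk generator are hypothetical.

```python
import random


def random_walk(n, d, rng):
    """Positions visited at steps 0..n by a random walk on a d-dimensional hypercubic lattice."""
    site = [0] * d
    positions = [tuple(site)]
    for _ in range(n):
        axis = rng.randrange(d)
        site[axis] += rng.choice((1, -1))
        positions.append(tuple(site))
    return positions


def largest_loop(positions):
    """Size of the largest 0-segment, using an index sorted lexicographically by position."""
    order = sorted(range(len(positions)), key=lambda i: (positions[i], i))
    largest = 0
    group_start = order[0]               # earliest visit of the current site
    for previous, current in zip(order, order[1:]):
        if positions[current] != positions[previous]:
            group_start = current        # a new site begins in the sorted index
        else:
            largest = max(largest, current - group_start)
    return largest


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (1, 3, 5):
        print(d, largest_loop(random_walk(1000, d, rng)))
```

Storing only the time series of positions keeps the memory use at $O(dN)$, and the sequential scan after sorting is linear in $N$.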
FIGURES



FIG. 1. Low-T configuration of a polyampholyte, which resembles a necklace made up of weakly 
charged beads and a highly charged string. 
[Figure 2 plot: $S_i(\omega)$ versus $i$; the level $Q = 4$ is marked.]

FIG. 2. Example of a RS $\omega$, and the corresponding RW depicted by $S_i(\omega)$. In this case, the
longest 0-segments have lengths $L = 18$ (dotted lines), while the longest 4-segments (dot-dashed
lines) have lengths $L = 22$. There are no 8-segments.
FIG. 3. Probability density of largest neutral segments as a function of reduced length $\ell = L/N$.
Symbols depict exact enumeration results for $N$ up to 36. In each graph, the solid line shows the
MC evaluation of $p(\ell, 0)$ from $10^8$ randomly selected sequences of length $N = 1000$.

FIG. 4. Probability density of largest $Q$-segments as a function of reduced charge $q$ and reduced
length $\ell$. The results have been obtained from MC simulations (see text).






FIG. 5. (a): A plot of the areas computed from the distribution in Fig. ||. (b): Demonstration
of the relation $A_\ell \sim 1/\sqrt{1 - \ell}$ for $\ell > 1/2$.

FIG. 6. Schematic illustration of the mapping $f$. A pair of sequences from $B_{Q'}$ and $A_{Q-Q'}$ are
combined to form a sequence from $C_Q$.

FIG. 9. Illustration of the probabilities $\phi_L$ and $\tilde{\phi}_L$. Configuration (a) contributes to both, (b)
to neither, and (c) contributes to $\tilde{\phi}_L$ but not $\phi_L$.

FIG. 10. Numerical demonstration of the power law relation $\phi_L \sim L^{-3/4}$, and the determination
of the constant $C_\phi$.
FIG. 12. The value for $C_f = \lim_{L \to \infty} (1 - \sum_{i \le L} f_i)^{-1}$ is calculated by keeping a finite number of
terms in the series and extrapolating to $1/L = 0$. Both exact and Monte Carlo data are shown.
The MC data is obtained by starting with a single ensemble of 10 s RWs, thus the data points are
not statistically independent.

FIG. 13. Construction of a sequence that contributes to $P_N(L, 0)$. A neutral segment $\omega_0$ of
length $L$ is augmented by two segments $\omega_L$ and $\omega_R$, such that $\omega_R$ ($\omega_L^*$) always stays above the
maximum of $\omega_0$ ($\omega_0^*$). $\omega_R$ is allowed to touch a new maximum of $\omega_0$, since this only produces
neutral segments of length $L$ which are to the right of $\omega_0$.





FIG. 14. Left: The distribution functions $p^{(d)}(\ell, 0)$ in 1, 2 and 3 dimensions. Right: The $\ell \to 1$
limit of the distributions.

FIG. 15. Left: The distribution function $p^{(5)}(\ell, 0)$ approaches a delta function with increasing
$N$. Right: The 90% threshold $L^*_d(N)$ scales with the RW size $N$; the slope in the log-log plot gives
$\beta_d$ for $d = 5, 6, 7$.

FIG. 16. Left: The distribution function $p^{(4)}(\ell, 0)$ is fitted very well with a function of the
form $\ell^{-a_1} \exp(-a_2/\ell)$. Right: $\beta_4$ is determined from the finite size scaling of the peak positions as
approximately 0.16.